Design of Chinese Morphological Analyzer
نویسندگان
چکیده
This is a pilot study which aims at the design of a Chinese morphological analyzer which is in state to predict the syntactic and semantic properties of nominal, verbal and adjectival compounds. Morphological structures of compound words contain the essential information of knowing their syntactic and semantic characteristics. In particular, morphological analysis is a primary step for predicting the syntactic and semantic categories of out-of-vocabulary (unknown) words. The designed Chinese morphological analyzer contains three major functions, 1) to segment a word into a sequence of morphemes, 2) to tag the part-of-speech of those morphemes, and 3) to identify the morpho-syntactic relation between morphemes. We propose a method of using associative strength among morphemes, morpho-syntactic patterns, and syntactic categories to solve the ambiguities of segmentation and part-of-speech. In our evaluation report, it is found that the accuracy of our analyzer is 81%. 5% errors are caused by the segmentation and 14% errors are due to part-of-speech. Once the internal information of a compound is known, it would be beneficial for the further researches of the prediction of a word meaning and its function.
منابع مشابه
ACBiMA: Advanced Chinese Bi-Character Word Morphological Analyzer
While morphological information has been demonstrated to be useful for various Chinese NLP tasks, there is still a lack of complete theories, category schemes, and toolkits for Chinese morphology. This paper focuses on the morphological structures of Chinese bi-character words, where a corpus were collected based on a welldefined morphological type scheme covering both Chinese derived words and...
متن کاملLightweight Client-Side Chinese/Japanese Morphological Analyzer Based on Online Learning
As mobile devices and Web applications become popular, lightweight, client-side language analysis is more important than ever. We propose Rakuten MA, a Chinese/Japanese morphological analyzer written in JavaScript. It employs an online learning algorithm SCW, which enables client-side model update and domain adaptation. We have achieved a compact model size (5MB) while maintaining the state-of-...
متن کاملThe Construction of a Dictionary for a Two-layer Chinese Morphological Analyzer
We built a morphological analyzer, which can be freely used by anyone for research purpose. In order to build a pratical system, a dictionary with reasonable size is necessary. The initial dictionary is built from the Penn Chinese Treebank corpus v4.0 and contains only 33,438 entries. Since the initial dictionary is quite small, unknown word detection methods are applied to a huge raw text in o...
متن کاملMorphological Analyzer for Kokborok
Morphological analysis is concerned with retrieving the syntactic and morphological properties or the meaning of a morphologically complex word. Morphological analysis retrieves the grammatical features and properties of an inflected word. However, this paper introduces the design and implementation of a Morphological Analyzer for Kokborok, a resource constrained and less computerized Indian la...
متن کاملChinese Morphological Analysis with Character-level POS Tagging
The focus of recent studies on Chinese word segmentation, part-of-speech (POS) tagging and parsing has been shifting from words to characters. However, existing methods have not yet fully utilized the potentials of Chinese characters. In this paper, we investigate the usefulness of character-level part-of-speech in the task of Chinese morphological analysis. We propose the first tagset designed...
متن کامل